WHO COVID-19 Research Database

Using Machine Learning to Enhance Archival Processing of Social Media Archives

Fan, L. Z.; Yin, Z. Y.; Yu, H. Z.; Gilliland, A. J.

ACM Journal on Computing and Cultural Heritage; 15(3), 2022.

Article in English | Web of Science | ID: covidwho-2162009

ABSTRACT

This article reports on a study using machine learning to identify incidences and shifting dynamics of hate speech in social media archives. To better cope with the archival processing need for such large-scale and fast evolving archives, we propose the Data-driven and Circulating Archival Processing (DCAP) method. As a proof-of-concept, our study focuses on an English language Twitter archive relating to COVID-19: Tweets were repeatedly scraped between February and June 2020, ingested and aggregated within the COVID-19 Hate Speech Twitter Archive (CHSTA), and analyzed for hate speech using the Generative Adversarial Network-inspired DCAP method. Outcomes suggest that it is possible to use machine learning and data analytics to surface and substantiate trends from CHSTA and similar social media archives that could provide immediately useful knowledge for crisis response, in controversial situations, or for public policy development, as well as for subsequent historical analysis. The approach shows potential for integrating multiple aspects of the archival workflow and supporting automatic iterative redescription and reappraisal activities in ways that make them more accountable and more rapidly responsive to changing societal interests and unfolding developments.

Using a Three-step Social Media Similarity (TSMS) Mapping Method to Analyze Controversial Speech Relating to COVID-19 in Twitter Collections

Yin, Z.; Fan, L.; Yu, H.; Gilliland, A. J.

Proceedings of the IEEE International Conference on Big Data (Big Data); pp. 1949-1953, 2020.

Article in English | Scopus | ID: covidwho-1186038

ABSTRACT

Addressing increasing calls to surface hidden and counter-narratives from within archival collections, this paper reports on a study that provides proof-of-concept of automatic methods that could be used on archived social media collections. Using a test collection of 3,457,434 unique tweets relating to COVID-19, China and Chinese people, it sought to identify instances of hate speech as well as hard-to-pinpoint trends in anti-Chinese racist sentiment. The study, part of a larger archival research effort investigating automatic methods for appraisal and description of very large digital archival collections, used a Three-step Social Media Similarity (TSMS) mapping method that aggregates hashtag mapping, TF-IDF Similarity Selection, and Emotion Similarity Calculation on the test collection. Compared to using a purely lexicon-based method to identify and analyze controversial speech, this method expanded the amount of controversial content detected from 21,050 tweets to 212,605, and the detection rate from 0.6% to 6.1%. We argue that the TSMS method could be similarly applied by archives in automatically identifying, analyzing, and describing other controversial content on social media and in other rapidly evolving and complex contexts in order to increase public awareness and facilitate public policy responses. © 2020 IEEE.
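The abstract names the TF-IDF Similarity Selection step but does not specify its tokenization, weighting scheme, seed set, or threshold. The following is a minimal, hypothetical sketch of how such a step could work, assuming that tweets already flagged by a lexicon serve as seeds and that an unflagged tweet is selected when its TF-IDF cosine similarity to any seed reaches a threshold. The function names, the smoothed IDF formula, the 0.3 threshold, and the toy tweets are all illustrative assumptions, not the authors' implementation; hashtag mapping and Emotion Similarity Calculation are not covered. The last lines check the reported detection rates against the collection size.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # ASSUMPTION: lowercase and keep word tokens, preserving a leading "#" on hashtags.
    return re.findall(r"#?\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with an assumed smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: count * idf[t] for t, count in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(seeds: list[str], candidates: list[str], threshold: float = 0.3) -> list[str]:
    """HYPOTHETICAL: keep candidates whose best similarity to any seed tweet meets threshold."""
    vectors = tfidf_vectors(seeds + candidates)
    seed_vecs, cand_vecs = vectors[: len(seeds)], vectors[len(seeds):]
    return [
        text
        for text, vec in zip(candidates, cand_vecs)
        if max((cosine(vec, s) for s in seed_vecs), default=0.0) >= threshold
    ]


def detection_rate(detected: int, total: int) -> float:
    """Share of the collection flagged as controversial."""
    return detected / total


# Toy, invented tweets; "#blametag" is a placeholder hashtag.
seeds = ["#blametag they brought the virus"]
candidates = ["they brought the virus here #blametag", "lovely weather in the park today"]
print(select_similar(seeds, candidates))  # only the first candidate is selected

# Figures from the abstract: 3,457,434 unique tweets in the test collection.
print(f"{detection_rate(21_050, 3_457_434):.1%}")   # lexicon-only: 0.6%
print(f"{detection_rate(212_605, 3_457_434):.1%}")  # TSMS: 6.1%
```

Expanding from lexicon-flagged seeds by similarity, rather than by exact keyword match, is one plausible way a method could detect tweets that share context with known hate speech without containing any lexicon term.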
